Highlights

A prediction method of coronaviral 3CLpro cleavage sites was proposed to 
balance the accuracy and false positives. 

3 of the 9 putative non-canonical cleavage sites were verified, which are located 
upstream to nsp4. 

All 11 canonical cleavage sites of MERS-CoV 3CLpro were confirmed and the 
Michaelis constants were calculated. 
Abstract

Coronavirus 3C-like protease (3CLpro) is responsible for the cleavage of coronaviral
polyprotein 1a/1ab (pp1a/1ab) to produce the mature non-structural proteins (nsps) of
nsp4-16. The nsp5 of the newly emerging Middle East respiratory syndrome
coronavirus (MERS-CoV) was identified as 3CLpro and its canonical cleavage sites
(between nsps) were predicted based on sequence alignment, but the cleavability of
these cleavage sites remains to be experimentally confirmed and putative
non-canonical cleavage sites (inside one nsp) within pp1a/1ab await further
analysis. Here, we propose a method for predicting coronaviral 3CLpro cleavage
sites that balances prediction accuracy against false positive outcomes. By applying
this method to MERS-CoV, the 11 canonical cleavage sites were readily identified and
verified by biochemical assays. The Michaelis constants of the canonical cleavage
sites of MERS-CoV showed that the substrate specificity of MERS-CoV 3CLpro is
relatively conserved. Interestingly, 9 putative non-canonical cleavage sites were
predicted and three of them could be cleaved by MERS-CoV nsp5. These results pave
the way for identification and functional characterization of new nsp products of
coronaviruses.

Keywords: MERS-CoV; 3C-like protease; Canonical cleavage sites; Non-canonical 
cleavage sites; Michaelis constants. 
Introduction

Middle East respiratory syndrome coronavirus (MERS-CoV) is an enveloped virus
carrying a genome of positive-sense RNA (+ssRNA). It was identified as the pathogen
of a new viral respiratory disease, named Middle East Respiratory Syndrome (MERS),
that broke out in Saudi Arabia in June 2012. MERS-CoV emerged ten years after
severe acute respiratory syndrome coronavirus (SARS-CoV) (Zaki et al., 2012) and
quickly spread to several countries in the Middle East and Europe (Assiri et al., 2013;
Tashani et al., 2014). Soon after the first report, the MERS-CoV genome was
sequenced and its genomic organization was elucidated (van Boheemen et al.,
2012). This new coronavirus is classified in lineage C of the betacoronaviruses, and is
close to bat coronaviruses HKU4 and HKU5 (de Groot et al., 2013; Lau et al., 2013).
Like other coronaviruses (Hussain et al., 2005; Zuniga et al., 2004), MERS-CoV
contains a 3' coterminal, nested set of seven subgenomic RNAs (sgRNAs), enabling
translation of at least 9 open reading frames (ORFs). The 5'-terminal two thirds of
the MERS-CoV genome contains a large open reading frame, ORF1ab, which encodes
polyprotein 1a (pp1a, 4391 amino acids) and polyprotein 1ab (pp1ab, 7078 amino
acids), the latter being translated via a -1 ribosomal frameshift at the end of ORF1a.
These two polyproteins were predicted to be subsequently processed into 16
non-structural proteins (nsps) by nsp3, a papain-like protease (PLpro), and nsp5, a
3C-like protease (3CLpro) (Kilianski et al., 2013; van Boheemen et al., 2012).

Proteases play a key role in the viral life cycle. They are essential for viral
replication because they mediate the maturation of the viral replicases, and they have
thus become targets of potential antiviral drugs (Thiel et al., 2003; Ziebuhr et al.,
2000). Investigating the cleavage sites of coronavirus proteases and the processing of
polyproteins pp1a/1ab will help to identify the viral proteins and their potential
functions in viral replication. Some cleavage sites have been identified and confirmed
by previous studies, including three cleavage sites of the PLpros of human coronavirus
229E (HCoV 229E), mouse hepatitis virus (MHV), SARS-CoV, MERS-CoV and infectious
bronchitis virus (IBV), whose cleavages release the first 3 non-structural proteins
(Bonilla et al., 1995; Kilianski et al., 2013; Lim and Liu, 1998; Ziebuhr et al., 2007).
The canonical cleavage sites of 3CLpros, the sites between the recognized nsps, have
also been characterized, including all sites of MHV, IBV, SARS-CoV and a fraction of
the sites of HCoV 229E, which release the non-structural proteins from nsp4 to nsp16
(Deming et al., 2007; Grotzinger et al., 1996; Liu et al., 1994; Liu et al., 1997; Lu et
al., 1995). For the 3CLpro of MERS-CoV, two cleavage sites releasing nsp4 to nsp6 have
been identified (Kilianski et al., 2013). However, other cleavage sites remain to be
characterized.

Furthermore, efforts have been made to predict these cleavage sites by sequence
comparison. Gorbalenya et al. made the first systematic prediction on IBV
pp1a/1ab according to the substrate specificity of the 3C protease of picornaviruses
(Gorbalenya et al., 1989). However, two of their predicted cleavage sites within nsp6
of IBV proved to be uncleavable (Liu et al., 1997; Ng and Liu, 2000). Gao et al.
developed software (ZCURVE_CoV) to predict the nsps as well as gene-encoded
ORFs of coronaviruses more accurately, based on previous studies of the 3CLpro
cleavage sites of IBV, MHV and HCoV 229E (Gao et al., 2003). Later on,
non-orthogonal decision trees were used to mine the coronavirus protease cleavage
data and to improve the sensitivity and accuracy of prediction (Yang, 2005). However,
because these methods focus on the prediction of canonical cleavage sites and
increasingly emphasize prediction accuracy to avoid false positives, potential
non-canonical cleavage sites might be neglected. For example, a cleavage site
between nsp7-8 of MHV strain A59 is not predicted by the above methods, but proved to
be physiologically important since it produces a shorter nsp7 that can support the
growth of MHV carrying a mutation at the nsp7-8 cleavage site (Deming et al., 2007).
Therefore, the substrate specificities of coronavirus 3CLpros are complicated. A
3CLpro substrate library of four coronaviruses (HCoV-NL63, HCoV-OC43,
SARS-CoV and IBV) containing 19 amino acids x 8 positions variants was
constructed by making single amino acid (aa) substitutions at each position from P5 to
P3', and their cleavage efficiencies were measured and analyzed to find the most
preferred residues at each position (Chuck et al., 2011). However, a non-canonical
cleavage site with less preferred residues is used by coronaviruses
(Deming et al., 2007). Thus we speculate that other potential 3CLpro cleavage sites
may still exist in coronaviruses.

In order to set up more moderate and balanced criteria for protease cleavage site
identification, we compared 6 scanning conditions of different stringency to
systematically predict the 3CLpro cleavage sites on pp1a/1ab of 5 coronaviruses
including MERS-CoV. As a representative, the cleavability of the predicted cleavage
sites of MERS-CoV 3CLpro was analyzed by the recombinant luciferase cleavage
assay and the fluorescence resonance energy transfer (FRET) assay. The results
showed that all 11 canonical cleavage sites of MERS-CoV pp1a/1ab were cleavable in
our experiments and 3 of 9 predicted non-canonical cleavage sites appeared to be
cleavable. Our study points out a new direction for the prediction and
identification of protease cleavage sites and contributes to understanding the
mechanism of coronaviral polyprotein processing.

Materials and Methods 

Information collection of coronavirus 3CLpro cleavage sites. The genome
sequences of 28 coronaviruses were downloaded from the GenBank database and the
sequences of the 3CLpro cleavage sites were collected from P4 to P2' (Table S1 to
Table S4). The substrate profiles of each coronavirus group and the whole
Coronavirinae were summarized (Table S5).

Construction of recombinant 3CLpro expression vectors. The coding sequence of
MERS-CoV nsp5 (NC_019843) was synthesized chemically by GenScript and cloned
into vectors pET28a and pGEX-6P-1, respectively. The catalytic residue mutation
C148A was generated by overlapping PCR with mutagenic primers (Table S6). All
the clones and mutations were confirmed by DNA sequencing.

Expression and purification of recombinant proteins. The expression vectors were
transformed into Escherichia coli strain BL21 (DE3). The cells were grown at 37°C in
Lysogeny broth (LB) medium with antibiotics and induced with 0.2 mM
isopropyl β-D-thiogalactopyranoside (IPTG) at 16°C for 12 hours. The cells were
harvested and resuspended in lysis buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1
mM EDTA, 0.05% NP40, 0.1 mg/ml lysozyme and 1 mM PMSF) at 4°C. After
incubation for 30 min on ice, 10 mM MgCl2 and 10 μg/ml DNase I (Sigma) were
added to digest the genomic DNA. After centrifugation, the supernatant of the cell
lysate was applied to an affinity chromatography column. The His-tagged recombinant
protein was bound to nickel-nitrilotriacetic acid (Ni-NTA) resin (GenScript) and
washed with buffer A (50 mM Tris-HCl, pH 7.5, 150 mM NaCl), buffer B (50 mM
Tris-HCl, pH 7.5, 150 mM NaCl, 20 mM imidazole) and buffer C (50 mM Tris, pH
7.5, 150 mM NaCl, 50 mM imidazole). Proteins were eluted with buffer D (50 mM
Tris, pH 7.5, 150 mM NaCl, 250 mM imidazole). GST-tagged protein was bound to
GST resin (GenScript), washed with buffer A and eluted with buffer A supplemented
with 10 mM reduced glutathione (GSH). The purified proteins were desalted and
concentrated by ultrafiltration using 30 kD Amicon Ultra 0.5-ml centrifugal filters
(Millipore).

Luciferase-based biosensor assay. All the cleavage sites (8 residues, ranging from
P5 to P3') were inserted into Glo-Sensor 10L linear vector. Compared with the wild
type firefly luciferase (550 aa), Glo-Sensor luciferase has short truncations at both
termini with the C- and N-parts reversed, resulting in new 234-aa N- and 233-aa
C-terminal regions, respectively. The inserted sequence and the reversed arrangement
of the N- and C-terminal regions reduce the luciferase activity dramatically. After the
recognition sequence is cut by nsp5, the luciferase recovers its activity and
luminesces in the presence of luciferase substrate. The back-to-front recombinant
firefly luciferase carrying the different cleavage sites was expressed by
co-incubating the recombinant plasmids with a cell-free protein expression system
extracted from wheat germ (Promega). After incubation for 2 hours at 25°C, nsp5 was
added and the whole system was incubated at 30°C for 1 hour. Then,
the reaction was diluted 20 times and mixed thoroughly with an equal volume of
luciferase substrate. Luciferase luminescence was measured with a luminometer
(Promega) after incubation for 5 min at room temperature.

Peptide-based FRET assay. All the 11 conserved putative recognition sites were
designed from P12 to P8', synthesized and modified with a typical shorter wavelength
FRET pair, N-terminal DABCYL and C-terminal Glu-EDANS, by GL Biochem
(Shanghai). The peptides were completely dissolved in DMSO and the final
concentration of DMSO in the reaction system was 1%. 180 μM substrate peptide and
16.3 μM tagged nsp5 were mixed in a solution of 50 mM Tris, pH 7.5, 1 mM EDTA,
50 μM DTT and incubated at 37°C for 2 hours. To calculate kcat/Km, different
amounts (7.2 μM - 180 μM) of substrate peptides were co-incubated with 16.3 μM
nsp5. The reaction was placed in a Greiner black plate and the fluorescence was
detected by a microplate reader (Molecular Devices) with Ex/Em (nm/nm) = 340/490.
Relative Fluorescence Units (RFU) were collected every 30 sec for 2 hours.

Calculation of Michaelis constants. The initial slope (slope A = RFU/min) was
obtained from the linear interval of the rising stage. Then, a linear equation was
fitted to the RFU at plateau (RFUmax) vs. the concentration of substrate. Its
slope (slope B = RFU/[S]) indicates the RFU change per unit change of [S]. The
initial reaction velocity (V0 = [S]/min) was calculated by dividing slope A by
slope B. The Michaelis-Menten kinetic constants were obtained from a Lineweaver-Burk
plot.
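
The calculation described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function names (linear_fit,
initial_velocity, lineweaver_burk) and the synthetic numbers are hypothetical and not
taken from this study, and a plain least-squares fit stands in for whatever fitting
software was actually used.

```python
# Illustrative sketch only: function names and all numbers are hypothetical.

def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def initial_velocity(slope_a, slope_b):
    """V0 = slope A (RFU/min) divided by slope B (RFU per unit of [S])."""
    return slope_a / slope_b


def lineweaver_burk(substrate_concs, velocities):
    """Fit 1/V0 = (Km/Vmax) * (1/[S]) + 1/Vmax and return (Km, Vmax)."""
    inv_s = [1.0 / s for s in substrate_concs]
    inv_v = [1.0 / v for v in velocities]
    slope, intercept = linear_fit(inv_s, inv_v)
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


# Synthetic example: velocities are generated from an assumed Km = 50 and
# Vmax = 2 (arbitrary units) and then recovered by the double-reciprocal fit.
# Only the end points 7.2 and 180 come from the text; the rest are assumed.
concs = [7.2, 18.0, 45.0, 90.0, 180.0]
vels = [2.0 * s / (50.0 + s) for s in concs]
km, vmax = lineweaver_burk(concs, vels)
print(f"Km = {km:.2f}, Vmax = {vmax:.2f}")
```

With Vmax in hand, kcat would follow as Vmax divided by the enzyme concentration,
and kcat/Km as the ratio of the two fitted quantities.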
Results

The coronavirus 3CLpros and their cleavage sites are evolutionarily conserved 
among different genera. 

To study the genetic diversity and evolution of the 3CLpro cleavage sites of
coronavirus pp1a/1ab, 308 primary sequences of 3CLpro cleavage sites (ranging
from P4 to P2') of 28 species of coronaviruses were collected and listed in Tables
S1-S4, including the predicted and verified cleavage sites. The 11 canonical cleavage
sites of each coronavirus were joined end to end to produce a spliced sequence which
was then used to produce a phylogenetic tree (Fig. 1A). In addition, the sequences of
all coronavirus 3CLpros were used to generate another phylogenetic tree (Fig. 1B). The
analyses showed that the phylogenetic distances and taxonomic positions of each
virus, in both phylogenetic trees, were mostly consistent with the classification of
the International Committee on Taxonomy of Viruses (ICTV)
(http://www.ictvonline.org/virusTaxonomy.asp). These results imply that the
cleavage sites of coronaviral 3CLpros might co-evolve with 3CLpros, and that both
3CLpro and its cleavage sites are relatively conserved between
different genera of coronaviruses. However, on the phylogenetic tree generated with
3CLpro cleavage sites (Fig. 1A), the members of the genus Gammacoronavirus,
although clustered closely, are split into alphacoronaviruses and deltacoronaviruses,
suggesting that the cleavage sites of gammacoronaviruses may have undergone
recombination events during evolution.

Setup of the predicting conditions of coronaviruses 3CLpro cleavage sites. 

In order to develop an optimized method for cleavage site prediction that covers all
possible cleavage sites with fewer false positives, we set three levels of criteria
(stringent, moderate and mild) for cleavage site prediction. Under the stringent rules,
3CLpro cleavage sites comprise only the most preferred residues at each position
as previously described (Chuck et al., 2011). Under the moderate rules, 3CLpro
cleavage sites comprise residues that have ever appeared in the cleavage sequences of
congeneric coronaviruses at each particular position. Under the mild rules, the cleavage
sites may comprise any residues ever found in the cleavage sequences of all
coronaviruses at each particular position. Because the substrate preference at P4 and
P2' is not strong, we adopted two different lengths of cleavage sequences for
prediction, one containing 6 residues from position P4 to P2', and the other containing
4 residues from position P3 to P1'. These two lengths of cleavage sequences,
combined with the three different criteria, made up a total of six search conditions for
cleavage site prediction with decreasing degree of stringency. The canonical
cleavage sites of 3CLpro for these 7 groups of coronaviruses were summarized in
Tables S1-S4 and used to set conditions III to VI. Possible residues at each particular
position of 3CLpro cleavage sites were predicted based on all six conditions to make
the cleavage site profile of coronavirus 3CLpro (Table S5). In principle, condition I
identifies the smallest number of possible cleavage sites in a scanned sequence,
while condition VI predicts the largest number of possible
cleavage sites in a scanned sequence.
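
The scanning idea behind the moderate and mild rules can be sketched as follows.
This is a minimal illustration, not the authors' implementation: the helper names
(build_profile, scan) and the toy sequences are hypothetical, and the real residue
sets are those of Tables S1-S5. A profile lists, for each position, every residue
that has ever appeared there in a chosen set of known cleavage sites, and a window
is reported when each of its residues is allowed at its position.

```python
# Illustrative sketch only: the helper names and toy sequences are hypothetical.

def build_profile(known_sites):
    """For each position, collect every residue seen in the known cleavage sites."""
    width = len(known_sites[0])
    if any(len(site) != width for site in known_sites):
        raise ValueError("all cleavage sites must have the same length")
    return [{site[i] for site in known_sites} for i in range(width)]


def scan(sequence, profile, p1_offset):
    """Return (position of P1, matched window) for every window fitting the profile.

    p1_offset is the 0-based index of P1 inside the window: 3 for a
    P4-P2' window and 2 for a P3-P1' window. Positions are 1-based, and the
    predicted cleavage lies directly after the P1 residue.
    """
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(res in allowed for res, allowed in zip(window, profile)):
            hits.append((start + p1_offset + 1, window))
    return hits


# Toy 6-residue sites (P4 to P2'); trimming both ends gives P3 to P1'.
six_mers = ["AVLQSG", "ATLQAG", "SVLQAN"]
four_mers = [site[1:-1] for site in six_mers]

toy_polyprotein = "MKTLQSGGAVLQANP"
print(scan(toy_polyprotein, build_profile(six_mers), p1_offset=3))
print(scan(toy_polyprotein, build_profile(four_mers), p1_offset=2))
```

In this toy case the 4-residue window reports one candidate more than the 6-residue
window, which mirrors the decreasing stringency described above. Restricting
known_sites to congeneric viruses would correspond to the moderate rule, and pooling
the sites of all coronaviruses to the mild rule.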

To test the applicability, we applied all six conditions to 5 representative
coronaviruses, including HCoV 229E from alphacoronavirus, MHV from
betacoronavirus lineage A, SARS-CoV from betacoronavirus lineage B, MERS-CoV
from betacoronavirus lineage C and IBV from gammacoronavirus. The pp1a/1ab of the
five representative coronaviruses was scanned for all possible cleavage sites under
each condition and the results were summarized in Table 1. As shown in
Table 1, increasing numbers of cleavage sites were found for each coronavirus as
conditions I to VI were applied. Conditions I and II were
too strict to cover all 11 canonical cleavage sites; conditions V and VI were so loose
that they produced 2-3 times more than 11 cleavage sites; condition III could only
cover the canonical cleavage sites for SARS-CoV; only condition IV generated an
appropriate number of cleavage sites for all 5 coronaviruses. Therefore, search condition
IV was chosen for further analysis of the cleavage sites of MERS-CoV.

By applying search condition IV, 9 putative cleavage sites (PSs) as well as the 11
canonical cleavage sites (CSs) were predicted (Table 2). Although the canonical
cleavage sites of MERS-CoV 3CLpro have been predicted by sequence alignment
with other coronaviruses (van Boheemen et al., 2012), our results suggest that
additional cleavage might occur during MERS-CoV pp1a/1ab processing.

Activity of MERS-CoV 3CLpro in biochemical assays.

To verify the activity of MERS-CoV 3CLpro and the cleavability of the predicted
cleavage sites, biochemical assay systems for MERS-CoV 3CLpro were
established. As shown in Fig. 2A and 2B, we first expressed and purified MERS-CoV
3CLpro (nsp5) with different tags and mutations: N-terminally GST-tagged nsp5
(Gnsp5, 60.4 kDa), N-terminally His-tagged (34 extra amino acids with 6 x His tag and
linker provided by vector pET-28a) nsp5 (Hnsp5, 36.9 kDa), Hnsp5 with catalytic
residue mutation C148A (Hnsp5m, 36.9 kDa) (Kilianski et al., 2013) and GST
tag-GVLQ-nsp5 with C148A mutation and 6 x His tag (Gnsp5mH, 61.6 kDa), in which
the sequence motif GVLQ represents the last four residues of MERS-CoV nsp4,
mimicking the cleavage site of MERS-CoV nsp4/nsp5. In the biochemical assays,
Gnsp5mH with the catalytic residue mutation C148A did not undergo self-cleavage at
the cleavage site to release GST during incubation for 16 hours (Fig. 2C), indicating
that the 3CLpro activity of MERS-CoV nsp5 in Gnsp5mH was inactivated by the mutation
C148A. Thus, Gnsp5mH was used as the protease substrate in the following biochemical
assays. To verify the 3CLpro activity of the recombinant nsp5s, Gnsp5 and Hnsp5 were
incubated with substrate Gnsp5mH for 5 minutes to 16 hours and analyzed by
SDS-PAGE (Fig. 2D) and Western blotting (Fig. 2E). Both Gnsp5 and
Hnsp5 showed proteolytic activity, cleaving the substrate Gnsp5mH into two parts,
GST (26.0 kDa) and nsp5mH (34.1 kDa), as confirmed by
their molecular weights (Figs. 2D and 2E). However, the 3CLpro activity of Gnsp5
was obviously weaker than that of Hnsp5, which could entirely cleave the substrate
Gnsp5mH by 2 hours post treatment (Figs. 2D and 2E). These results could be explained
by the larger fusion tag at the N terminus of MERS-CoV 3CLpro significantly
reducing its proteolytic activity, which is consistent with a previous
observation (Xue et al., 2007). In the biochemical assays, the relatively lower
proteolytic activity of 3CLpro makes it easier to observe the influence of different
substrates. Therefore, both recombinant Gnsp5 and Hnsp5 were used as MERS-CoV
3CLpro in the following studies.

Identification of the cleavability of predicted cleavage sites in MERS-CoV
pp1a/1ab.

To rapidly evaluate the proteolytic activity of MERS-CoV 3CLpro towards the
predicted cleavage sites, a sensitive luciferase-based biosensor
assay was adopted. As shown in Fig. 3A, the canonical cleavage sites (CS) of
MERS-CoV nsp4/nsp5 (CS4/5) and nsp5/nsp6 (CS5/6), which were experimentally
confirmed in a previous study (Kilianski et al., 2013), were inserted into the inverted
and circularly permuted luciferase construct pGlo-10F, in which the N-terminal and
C-terminal halves of the luciferase gene are separated. The resulting luciferase
produced in the in vitro translation system was inactive and could be converted into
an active luciferase when cleaved by recombinant viral protease at the engineered
cleavage sites (such as CS4/5 and CS5/6). In this system, luciferase signals were
detected upon incubation with either Gnsp5 or Hnsp5 (Fig. 3B). In contrast, the
mutated nsp5 (Hnsp5m) could not convert the inactive luciferase into its active form
(Fig. 3B). This result indicated that the luciferase-based biosensor assay could be
used to evaluate the proteolytic activity of MERS-CoV 3CLpro. Then, the other 9
canonical cleavage sites and the 9 putative cleavage sites, each composed of 8 aa from
MERS-CoV pp1a/1ab, were inserted into the luciferase construct pGlo-10F, and the
luciferase-based biosensor assays were performed using Hnsp5 and Hnsp5m,
respectively. As shown in Fig. 3C, all 11 canonical cleavage sites of MERS-CoV 3CLpro
generated a luciferase signal with Hnsp5 at least 6.6 times higher than with the
inactive Hnsp5m, indicating that all
these canonical sites could be cleaved by MERS-CoV 3CLpro. These results
experimentally verified the existence of the 11 predicted canonical cleavage sites.
Interestingly, among the 9 putative cleavage sites, the luciferase signals of PS1-1,
PS3-1 and PS3-3 increased remarkably, by more than 70 fold, when incubated with
Hnsp5, indicating that these putative cleavage sites, located inside nsp1 and nsp3 of
MERS-CoV respectively, might be cleavable (Fig. 3D). The other 6 predicted putative
sites (PS3-2, PS5-1, PS6-1, PS12-1, PS13-1, and PS16-1) showed a less than 2.5 fold
increase of luciferase signal when treated with Hnsp5 compared with those
treated with Hnsp5m (Fig. 3C and 3D). Given the high sensitivity of the luciferase-based
biosensor assay and the fact that the confirmed canonical cleavage sites generated at
least a 6.6 fold increase of luciferase signal, the cleavage signal of these six sites
may represent the background level, indicating that they are likely uncleavable per se.
These results suggest that previously unrecognized 3CLpro cleavage sites, regarded
here as non-canonical cleavage sites, may exist inside the nsps.
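
The fold-change comparison used to judge cleavability can be sketched as below.
This is an illustration only: the function names and the luminescence readings are
hypothetical, and the 6.6 fold cutoff is simply the lowest value reported above for a
canonical site rather than a statistically derived threshold.

```python
# Illustrative sketch only: names and luminescence readings are hypothetical.

THRESHOLD = 6.6  # lowest fold increase reported for a canonical site


def fold_change(active_signal, inactive_signal):
    """Signal with active Hnsp5 divided by signal with inactive Hnsp5m."""
    if inactive_signal <= 0:
        raise ValueError("control signal must be positive")
    return active_signal / inactive_signal


def classify(fold, threshold=THRESHOLD):
    """Label a site by comparing its fold change with the threshold."""
    return "cleavable" if fold >= threshold else "not cleavable"


# Hypothetical raw readings: (with Hnsp5, with Hnsp5m).
readings = {"site_A": (71000.0, 1000.0), "site_B": (2400.0, 1000.0)}
for site, (active, inactive) in readings.items():
    fold = fold_change(active, inactive)
    print(site, round(fold, 1), classify(fold))
```

Normalizing each site to its own Hnsp5m control, as done here, keeps differences in
expression level between constructs from being mistaken for cleavage.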

Analysis of the substrate specificity of MERS-CoV 3CLpro. 

The substrate specificity of coronavirus 3CLpro is determined by the residues from
the P4 to P2' positions of cleavage sites, and depends especially on the P1, P2 and P1'
positions; knowledge of it would benefit the prediction of cleavage sites and the
design of broad-spectrum inhibitors of coronavirus 3CLpro (Chuck et al., 2011; Hegyi and
Ziebuhr, 2002). Previous studies demonstrated that different canonical cleavage sites
of some representative coronaviruses are not equally susceptible to proteolysis by
recombinant 3CLpro (Fan et al., 2004; Hegyi and Ziebuhr, 2002). To define the
susceptibility of the canonical cleavage sites and the substrate specificity of MERS-CoV
3CLpro, 20-mer synthetic peptides representing the corresponding canonical cleavage
sites of MERS-CoV 3CLpro were synthesized and modified with N-terminal
DABCYL and C-terminal Glu-EDANS (Fig. 4A). The fluorophore EDANS and
quencher DABCYL are widely used in biochemical assays based on
fluorescence resonance energy transfer (FRET). As shown in Fig. 4B, the peptides
representing cleavage sites CS4/5 and CS5/6 were tested to optimize the FRET assay,
and the relative fluorescence unit (RFU) fold changes of both sites increased
significantly when incubated with Gnsp5 and Hnsp5. Although the FRET assay system is
more costly and less sensitive than the luciferase-based biosensor assay (Figs. 3B and
4B), it provides a continuous signal readout during the reaction, which allows the
kinetic characteristics of the protease towards different substrates to be measured.
The initial reaction rates (RFU/min) of all 11 canonical cleavage sites of MERS-CoV were
measured and are shown in Fig. 4C. The Michaelis constants including kcat, Km,
kcat/Km and relative kcat/Km were then calculated (Table 3). As shown in Table 3,
the substrate specificity of MERS-CoV 3CLpro is relatively conserved with that of other
coronaviruses as previously reported (Fan et al., 2004; Hegyi and Ziebuhr, 2002;
Ziebuhr and Siddell, 1999). The relative kcat/Km values of CS4/5 and CS5/6
indicated that the cleavage sites flanking MERS-CoV 3CLpro are converted
significantly faster than other sites. The efficient proteolysis at the sites flanking
nsp5 implies that nsp5 (3CLpro) might be released from polyprotein 1a/1ab at a
very early stage of the maturation of viral nsps, which is similar to HCoV,
TGEV, SARS-CoV and MHV (Fan et al., 2004; Hegyi and Ziebuhr, 2002). However,
the relative kcat/Km value of CS4/5 is lower than that of CS5/6 (Table 3), which
differs from the coronaviruses studied previously (Fan et al., 2004; Hegyi and Ziebuhr,
2002). This could be explained by the residue Gly (G) at P4 of the cleavage site between
nsp4 and nsp5 of MERS-CoV, which reduces the protease activity of 3CLpro compared with
the residues Ser (S), Ala (A) and Thr (T) of other coronaviruses (Tables S1-S4), as
previously described (Chuck et al., 2011). Whether such disparity plays any role in the
replication and pathogenesis of MERS-CoV is unknown.
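
A relative kcat/Km of the kind compared above can be obtained by dividing each
site's kcat/Km by that of a reference site. The sketch below is illustrative only:
the function names, the generic site labels and the values are hypothetical and are
not those of Table 3, and the choice of reference site is an assumption, since this
section does not state which site was used for normalization.

```python
# Illustrative sketch only: labels and values are made up, not those of Table 3.

def relative_efficiency(kcat_over_km, reference):
    """Divide every kcat/Km by the value of the chosen reference site."""
    ref = kcat_over_km[reference]
    if ref <= 0:
        raise ValueError("reference kcat/Km must be positive")
    return {site: value / ref for site, value in kcat_over_km.items()}


def rank_sites(kcat_over_km):
    """Site names ordered from most to least efficiently cleaved."""
    return sorted(kcat_over_km, key=kcat_over_km.get, reverse=True)


# Hypothetical kcat/Km values in arbitrary units.
example = {"site_1": 40.0, "site_2": 80.0, "site_3": 4.0}
print(relative_efficiency(example, reference="site_2"))
print(rank_sites(example))
```

Because the relative value is a ratio, it is unaffected by the unit in which
kcat/Km is expressed, which makes comparison between sites straightforward.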

Discussion 

The processing of viral polyprotein by 3CLpro is essential for the replication of
coronaviruses. Besides the 11 canonical cleavage sites of coronaviruses, some
additional cleavage sites inside nsps, so-called non-canonical cleavage sites, have also
been identified (Deming et al., 2007). Therefore, more non-canonical 3CLpro
cleavage sites may remain to be identified in different coronaviruses. In this study, we
designed 6 search conditions for predicting 3CLpro cleavage sites, among which
search condition IV provides a feasible way to reveal the potential cleavage sites of
3CLpro within coronaviruses. Based on the genetic diversity of different coronavirus
genera (Fig. 1), scanning condition IV adopted the residues of 3CLpro cleavage
sites that have ever appeared in the cleavage sequences of congeneric coronaviruses at
positions P3 to P1'. In contrast, conditions I, II, III, V and VI were either too
restrictive or generated too many false positive outcomes (Table 1). In the suggested
condition IV, 4 residues from position P3 to P1' were applied to the prediction of
3CLpro cleavage sites. By measuring the relative protease activities of 3CLpro from
different coronavirus genera against 19 amino acids x 8 positions of substrate
variants, it was shown that the substrate specificity at positions P5, P2' and P3' is
significantly lower than at other positions (Chuck et al., 2011). Therefore, the
consideration of 6 or more residues is unnecessary and could lead to potential
cleavage sites being missed (Table 1). Compared with previous research on the
prediction and identification of 3CLpro cleavage sites, scanning condition IV showed
its advantages. For example, the two nonexistent putative cleavage sites predicted
within nsp6 of IBV (Gorbalenya et al., 1989; Liu et al., 1997; Ng and Liu, 1998) were
avoided by our prediction method (data not shown). Notably, the non-canonical cleavage
site at the end of MHV nsp7 identified by Deming et al. could be predicted using
scanning condition IV.

By using search condition IV, 9 putative cleavage sites were predicted in
MERS-CoV pp1ab in addition to the 11 canonical cleavage sites. The luciferase signal
of CS10/12 increased 6.6 fold when treated with nsp5 in the recombinant luciferase
cleavage assays, which is the lowest among the 11 canonical cleavage sites (Fig. 3C).
Therefore, the 6.6 fold increase of luciferase signal was used arbitrarily as a threshold
for judging positive and negative. Among the 9 predicted putative cleavage sites, three
sites (PS1-1, PS3-1 and PS3-3) showed clearly increased signals at least 70 times
above the background (Fig. 3D) and were therefore regarded as cleavable sites. The
increase of signal of the other 6 predicted putative cleavage sites was less than 2.5
times (Fig. 3D). Therefore, they were regarded as non-cleavable sites and thus as false
positives of the prediction. Interestingly, the homologous sequences of PS1-1 and
PS3-1 are conserved in lineage C of betacoronavirus including MERS-CoV, BatCoV
HKU4 and BatCoV HKU5 (Figs. 5A and 5B). However, PS3-3 is a sequence unique to
MERS-CoV (Fig. 5C). Moreover, the cleavability of a cleavage site in biochemical
assays is a necessary but not sufficient condition for its physiological existence
during viral infection. A predicted cleavage site may or may not be accessible to a
protease. The 3D structure model of the MERS-CoV ADP-ribose-1-monophosphatase (ADRP)
domain built by comparative protein modeling and the structure of the papain-like
protease (PLpro) domain (Bailey-Elkin et al., 2014) showed that both PS3-1 and PS3-3
are located at the surface of the ADRP and PLpro domains, opposite to the enzymatic
active centers (Figs. 5D and 5E), suggesting that these two sites are likely accessible
to the proteinase. Most recently, the crystal structure of MERS-CoV 3CLpro was
determined (Needle et al., 2015). Although PS5-1 is also located at the surface of
MERS-CoV 3CLpro, self-cleavage of MERS-CoV nsp5 was not observed in this
study (Fig. 2). Therefore, the threshold we proposed in the luciferase-based biosensor
system to exclude false positive predictions is reasonable (Fig. 3D).
However, further studies are needed to identify the predicted cleavage products in
cells infected by MERS-CoV. Currently, such work with live MERS-CoV is
limited in our research facilities due to the biosafety rules, but it can be addressed in
collaboration in the future.

Notably, the outcomes of the two cleavage assay systems were different. The signal
fold change of the highly sensitive luciferase-based biosensor assay depends on the
accumulation of active luciferase released by nsp5 cleavage over 1 hour (Materials and
Methods section), whereas the outcome of the FRET assay is an instantaneous relative
fluorescence unit (RFU) signal. The RFU/min is the initial rate of the reaction,
which reflects, but does not equal, the efficiency of cleavage. These differences may
be caused by steric hindrance from the luciferase subunits, the distance between the
fluorophore and the quencher in the FRET substrates, and substrate solubility.
Therefore, the activities observed in the two systems cannot be compared
directly. Given the characteristics of the two assay systems, the highly
sensitive luciferase-based biosensor assay may be better suited to high-throughput
screening of predicted protease cleavage sites, while the FRET assay is better suited
to kinetic analysis of cleavage.
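
The two readouts discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and all numeric readings are hypothetical, and the slope is taken by an ordinary least-squares fit over time points that the user has already judged to lie in the linear interval.

```python
def activation_fold(signal_active, signal_mutant):
    """Luciferase activation fold: signal with active Hnsp5 divided by
    the signal with the inactive mutant Hnsp5m (as in the Fig. 3 legend)."""
    if signal_mutant <= 0:
        raise ValueError("mutant-control signal must be positive")
    return signal_active / signal_mutant


def initial_rate(times_min, rfu):
    """Least-squares slope (RFU/min) over the supplied time points.

    The caller is assumed to pass only readings from the linear
    interval right after the reaction begins.
    """
    n = len(times_min)
    if n != len(rfu) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_t = sum(times_min) / n
    mean_y = sum(rfu) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, rfu))
    return sxy / sxx


# Hypothetical readings, for illustration only.
fold = activation_fold(5200.0, 400.0)
rate = initial_rate([0, 1, 2, 3, 4], [100, 150, 200, 250, 300])
print(f"activation fold = {fold:.1f}, initial rate = {rate:.1f} RFU/min")
```

The fold value is an end-point ratio accumulated over the incubation, whereas the slope is a rate, which is one way to see why the two numbers are not directly comparable.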

According to the Michaelis constants of MERS-CoV 3CLpro, its substrate specificity is
relatively conserved with that of other coronaviruses (Fan et al., 2004;
Hegyi and Ziebuhr, 2002). Notably, Pro (P) has been selected during
evolution at position P2 of the cleavage site between nsp10 and nsp12 (CS10/12) of
lineage C betacoronaviruses, although this residue is not preferred by 3CLpro
according to a previous study (Chuck et al., 2011). However, the relative kcat/Km
value of MERS-CoV CS10/12 is 0.053, which is 26.5-fold higher than that of SARS-CoV
(Fan et al., 2004). This indicates that the substrate preferences at some cleavage
sites can still vary among different genera of coronaviruses, and that the proposed
scanning condition IV, which accepts any residue that appears in the cleavage
sequences of congeneric coronaviruses, is reasonable.

In summary, we proposed an optimized search condition for predicting cleavage sites 
of coronavirus 3CLpro. We verified the 11 canonical cleavage sites of pp1ab in
biochemical assays. We further identified 3 non-canonical cleavage sites in the nsps of 
MERS-CoV. The results provide clues for possible identification of novel cleavage 
products of coronavirus nsps and will benefit the studies of the mechanisms of 
coronavirus replication. 
Conclusions

Processing of polyprotein 1a/1ab by 3CLpro is essential in the coronavirus life cycle.
The 3CLpro cleavage site prediction methods established by previous studies focus on
accuracy, so some non-canonical cleavage sites were missed. In this study, we
built a moderate prediction method to balance accuracy against false positive
outcomes. Using this method, 9 putative cleavage sites, in addition to the 11 canonical
sites, were predicted in MERS-CoV pp1ab, and the cleavability of 3 of them was
experimentally confirmed. Interestingly, all 3 of these non-canonical cleavage sites
are located upstream of nsp4, in contrast with the previous understanding that the
coronavirus 3CL protease only cleaves from nsp4 to nsp16. This suggests a novel role
for 3CLpro in coronavirus pp1a/1ab processing. However, the cleavability of these
putative cleavage sites needs to be further verified in the viral proteins of
MERS-CoV-infected cells. Finally, the catalytic constants of the 11 canonical
cleavage sites of MERS-CoV 3CLpro showed that its substrate specificity is conserved
with that of other members of Coronaviridae.
Figure legends

Fig. 1. The phylogenetic tree of 26 representative coronaviruses. (A) The tree was
generated using an alignment of the joined canonical cleavage sites of 26
coronaviruses. Sequence alignment was performed by ClustalX 2.0, and the tree was
built by the neighbor-joining method in MEGA 4 (Bootstrap: replication = 1000, random
seed = 64238). (B) The tree was generated from the sequence of nsp5 and the method is
the same as described above.

Fig. 2. Purification of recombinant nsp5 of MERS-CoV and analysis of substrate
cleavage by protein cleavage assays. (A) Diagram of the 4 recombinant proteins. The
catalytic residue mutation C148A is indicated by a small black triangle. GVLQ
(P4-P1) are the last four residues of MERS-CoV nsp4. The insertion of these 4
residues made the N-terminal GST tag cleavable by active nsp5. The cleavage
position is indicated by a down arrow. (B) SDS-PAGE analysis of the recombinant
proteins. After all of the proteins were purified, they were concentrated to 1 mg/ml.
2 μg of Gnsp5mH, Hnsp5 and Hnsp5m and 1 μg of Gnsp5 were loaded onto a 10% SDS-PAGE
gel and stained with Coomassie brilliant blue. (C) and (D) Gnsp5mH was incubated in
50 mM Tris, pH 7.5, 1 mM EDTA, and 50 μM DTT at 37°C alone (C) or with Gnsp5 and
Hnsp5 (D). The substrate protein was diluted to 0.1 mg/ml. A fraction of the reaction
mixture was taken out at each time point (0 min, 5 min, 1 h, 2 h, 4 h, 16 h) and
analyzed by 10% SDS-PAGE. Products were detected by CBB staining (D) and
Western blot (E).

Fig. 3. Identification of the cleavability of predicted cleavage sites in recombinant
luciferase cleavage assays. (A) Schematic diagram of the recombinant luciferase. (B)
Verification of the recombinant luciferase assays. Inactive luciferase was synthesized
in the cell-free translation system and the reaction mixture was incubated at 25°C for
2 hours. After that, the protein mixture was divided into four parts and incubated with
1.63 μM Gnsp5, Hnsp5, Hnsp5m or H2O, respectively. After incubation for 1 hour at
30°C, the reaction product was diluted 20 times and mixed with an equal amount of
luciferase substrate. After incubation at room temperature for 5 min, the luciferase
luminescence was measured. The luciferase activation fold was calculated by
dividing the signal of the reaction treated with active Hnsp5 by that of the reaction
treated with the inactive nsp5 mutant Hnsp5m. (C) The luciferase cleavage assay of
the 11 predicted canonical cleavage sites and (D) the 9 putative cleavage sites. The
luciferase expression vectors carrying the cleavage sites were added to the wheat germ
protein translation mix and incubated at 25°C for 2 hours, and the reaction mixture was
divided and treated with Hnsp5 and Hnsp5m, respectively. The dashed line indicates
the lowest fold increase of luciferase signal by cleavage of previously confirmed
3CLpro cleavage sites. The data presented here are the mean values ± SD derived from
three independent experiments.

Fig. 4. Kinetic analyses of the 11 canonical cleavage sites cleaved by MERS-CoV
nsp5 by FRET assays. (A) Diagram of the FRET mechanism. Upon excitation at 340 nm,
EDANS transfers its 490 nm emission energy to DABCYL, so the emission is
undetected. After the peptide bond between P1 and P1' is cleaved by nsp5, the
separation prevents the energy transfer and the 490 nm emission of EDANS can
be detected. (B) 180 μM synthesized peptide was incubated with 16.3 μM tagged nsp5.
After incubation for 2 hours at 37°C, the fluorescence (Ex/Em = 340 nm/490 nm) was
read by a luminometer. (C) The rate of RFU rise (Slope A = RFU/min) at the linear
interval right after the reaction began. The data presented here are the mean values ±
SD derived from three independent experiments.

Fig. 5. Conservation analysis and the spatial location of the novel non-canonical
cleavage sites. The sequence alignment of the nsp regions covering the PS1-1 site (A),
PS3-1 site (B) and PS3-3 site (C). The cleavage sites of MERS-CoV are indicated by
black boxes. (D) Homology-modeled structure of the ADRP domain of MERS-CoV (template:
2FAV). The ADRP domain is shown as a green ribbon. The putative cleavage site is
colored red and the cleavage position Gln is shown as sticks. The substrate
(ADP-ribose) of the ADRP domain is shown as sticks and colored by atom (C: cyan, O:
red, N: blue, P: orange). (E) Structure of the papain-like protease of MERS-CoV (4RF1).
The PLpro domain is shown as a cartoon and colored green. The ligand ubiquitin is
colored cyan. The putative cleavage site is colored red and the cleavage position
Gln is shown as sticks.

Tables

Table 1. The number of cleavage sites in pp1ab of 5 representative coronaviruses
predicted by using 6 search conditions

                HCoV-229E   MHV         SARS-CoV    MERS-CoV    IBV
                CS a  PS b  CS    PS    CS    PS    CS    PS    CS    PS
Condition I c   1     0     2     0     2     0     2     0     1     0
Condition II    1     0     4     1     3     0     4     0     2     0
Condition III   11    4     11    5     11    0     11    2     11    3
Condition IV    11    10    11    14    11    4     11    9     11    5
Condition V     11    9     11    17    11    11    11    12    11    11
Condition VI    11    15    11    23    11    19    11    19    11    13

a Canonical cleavage sites, which are located between recognized nsps. 
b Putative cleavage sites, which are located inside various nsps. 
c Six search conditions were designed: Conditions I, III & V cover 6 residues from P4 to
P2'; Conditions II, IV & VI cover 4 residues from P3 to P1'. Conditions I and II
comprise the most preferred residues at each position; Conditions III and IV
comprise residues that appear in the cleavage sites of congeneric coronaviruses;
Conditions V and VI comprise residues that appear in the cleavage sequences of any
coronavirus.
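
The search conditions in footnote c amount to a position-specific residue filter slid along the polyprotein. The following Python sketch illustrates that idea only; it is not the authors' implementation. As a simplifying assumption, the allowed residues are collected from the 11 MERS-CoV canonical sites listed in Table 2 alone (the real conditions III and IV draw on congeneric coronaviruses), the toy sequence is invented, and reporting the 1-based position of the P1 residue is also an assumption.

```python
# P4..P2' residues of the 11 MERS-CoV canonical sites (Table 2), arrow removed.
CANONICAL_SITES = [
    "GVLQSG", "VVMQSG", "AAMQSK", "SVLQAT", "VKLQNN", "VRLQAG",
    "ALPQSK", "TTLQAV", "YKLQSQ", "TKVQGL", "PRLQAS",
]


def allowed_residues(sites, start, end):
    """Set of residues observed at each position in sites[start:end]."""
    return [{site[i] for site in sites} for i in range(start, end)]


def scan(sequence, allowed, p1_index):
    """Return (1-based P1 position, window) for every matching window."""
    width = len(allowed)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if all(res in ok for res, ok in zip(window, allowed)):
            hits.append((i + p1_index + 1, window))
    return hits


six = allowed_residues(CANONICAL_SITES, 0, 6)   # P4 to P2' (6 residues)
four = allowed_residues(CANONICAL_SITES, 1, 5)  # P3 to P1' (4 residues)

toy = "MKTAGVLQSGAAVLLQGHTT"  # hypothetical sequence, not from pp1ab
hits_six = scan(toy, six, p1_index=3)
hits_four = scan(toy, four, p1_index=2)
print(hits_six)
print(hits_four)
```

On this toy input the 4-residue window accepts a second candidate that the 6-residue window rejects at P2', which mirrors the trade-off in Table 1: shorter windows recover more putative sites at the cost of more candidates to verify.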

Table 2. The cleavage site prediction outcomes of MERS-CoV using search
condition IV

Canonical cleavage sites            Putative cleavage sites
Site      Position  Sequence a      Site      Position  Sequence
CS4/5     3247      GVLQ↓SG         PS1-1     122       TTLQ↓GK
CS5/6     3553      VVMQ↓SG         PS3-1     1191      VLLQ↓GH
CS6/7     3845      AAMQ↓SK         PS3-2     1278      DIPQ↓SL
CS7/8     3928      SVLQ↓AT         PS3-3     1683      VVLQ↓GL
CS8/9     4127      VKLQ↓NN         PS5-1     3332      HAMQ↓GT
CS9/10    4237      VRLQ↓AG         PS6-1     3580      IILQ↓AT
CS10/12   4377      ALPQ↓SK         PS12-1    5076      NILQ↓AT
CS12/13   5130      TTLQ↓AV         PS13-1    5591      VTVQ↓GP
CS13/14   5908      YKLQ↓SQ         PS16-1    6793      FKVQ↓NV
CS14/15   6432      TKVQ↓GL
CS15/16   6775      PRLQ↓AS

a The "↓" indicates the cleavage position.

Table 3. The Michaelis constants of the 11 canonical cleavage sites of MERS-CoV
3CLpro

Site      kcat (min^-1)       Km (μM)         kcat/Km (mM^-1 min^-1)  kcat/Km (rel)  P value a
CS4/5     0.3053±0.05661      75.89±17.57     4.023                   1              -
CS5/6     0.6811±0.1388       88.25±18.19     7.717                   1.9            0.015
CS6/7     0.2993±0.04865      264.7±36.95     1.131                   0.28           <0.0001
CS7/8     2.073±0.5245        321.89±97.63    6.441                   1.6            0.011
CS8/9     0.5161±0.04468      423.1±27.32     1.220                   0.30           <0.0001
CS9/10    2.390±0.2397        833.8±182.1     2.866                   0.71           0.103
CS10/12   0.1152±0.02049      534.9±91.71     0.2154                  0.053          <0.0001
CS12/13   0.1083±0.002443     83.90±3.949     1.290                   0.32           <0.0001
CS13/14   0.1815±0.0200       449.7±1.996     0.4036                  0.10           <0.0001
CS14/15   0.05115±0.00878     207.2±59.61     0.2469                  0.061          <0.0001
CS15/16   0.3849±0.01126      100.7±6.473     3.823                   0.95           0.58

a P value was statistically analyzed by unpaired Student's t-test.
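
The kcat/Km columns of Table 3 can be reproduced from the kcat and Km means, as the short Python sketch below shows for four of the sites. The function and variable names are illustrative, and taking CS4/5 as the reference for the relative value is inferred from its relative kcat/Km of 1 in the table. Because the inputs here are the rounded table entries, the last digit can differ slightly from the printed values.

```python
# (kcat in min^-1, Km in μM) means taken from Table 3.
KINETICS = {
    "CS4/5": (0.3053, 75.89),
    "CS5/6": (0.6811, 88.25),
    "CS10/12": (0.1152, 534.9),
    "CS15/16": (0.3849, 100.7),
}


def catalytic_efficiency(kcat_per_min, km_micromolar):
    """Return kcat/Km in mM^-1 min^-1 (Km converted from μM to mM)."""
    return kcat_per_min / (km_micromolar / 1000.0)


# CS4/5 is assumed to be the reference site (relative kcat/Km = 1).
reference = catalytic_efficiency(*KINETICS["CS4/5"])

for site, (kcat, km) in KINETICS.items():
    efficiency = catalytic_efficiency(kcat, km)
    relative = efficiency / reference
    print(f"{site}: kcat/Km = {efficiency:.4g} mM^-1 min^-1, relative = {relative:.2g}")
```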